Basic concepts and tools for the Toki Pona minimalist and constructed language: Wordnet synsets; analysis of the vocabulary; synthesis and syntax highlighting of texts
Abstract
A minimalist constructed language (conlang) is useful for experiments and convenient for building tools. The Toki Pona (TP) conlang is minimalist both in its vocabulary (only 14 letters and 124 words) and in its ≈ 10 syntax rules. It is valuable as an actively used and somewhat established minimalist conlang with at least hundreds of fluent speakers. In this article, we describe current concepts and resources for TP and make available Python routines for analyzing the language, synthesizing texts, specifying syntax highlighting schemes, and building a preliminary TP Wordnet [1]. We focus on the analysis of the basic vocabulary, since corpus analyses are already available [2]. The synthesis is based on sentence templates, relates to context by keeping track of used words, and renders larger texts using a fixed number of phonemes (e.g. for poems) and a fixed number of sentences, words and letters (e.g. for paragraphs). Syntax highlighting reflects the morphosyntactic classes given in the official dictionary; different solutions are described and implemented in the well-established Vim text editor [3]. The tentative TP Wordnet is made available in three forms that reflect the choices of synsets related to each word. In summary, this text presents potentially novel conceptualizations of the TP language, together with tools and results for analyzing, synthesizing and syntax highlighting it.
Keywords: Constructed languages, Natural Language Processing, Syntax highlighting, Wordnet, Toki Pona
arXiv:1712.09359v1 [cs.CY] 26 Dec 2017
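The template-based synthesis described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the lexicon, its word-class labels, the templates, and the function names `make_sentence` and `make_paragraph` are all assumptions chosen for illustration. It only shows the general idea of filling sentence templates while tracking used words, so that words not yet used are preferred.

```python
import random

# Hypothetical mini-lexicon. The class labels are illustrative assumptions,
# not the classification given in the official TP dictionary.
LEXICON = {
    "noun": ["jan", "soweli", "telo", "kili", "tomo"],
    "verb": ["moku", "lukin", "pali", "olin"],
    "modifier": ["pona", "suli", "lili", "wawa"],
}

# Each template is a sequence of slots. Slots that are not lexicon classes
# ("li", "e") are TP particles and are copied verbatim.
TEMPLATES = [
    ("noun", "li", "verb"),
    ("noun", "li", "verb", "e", "noun"),
    ("noun", "modifier", "li", "verb", "e", "noun"),
    ("noun", "li", "modifier"),
]


def make_sentence(rng, used):
    """Fill a randomly chosen template, preferring words not yet in `used`."""
    words = []
    for slot in rng.choice(TEMPLATES):
        if slot in LEXICON:
            candidates = LEXICON[slot]
            fresh = [w for w in candidates if w not in used]
            # Fall back to the full class once every word has been used.
            word = rng.choice(fresh or candidates)
            used.add(word)
            words.append(word)
        else:
            words.append(slot)
    return " ".join(words)


def make_paragraph(n_sentences, seed=0):
    """Render a paragraph with a fixed number of sentences."""
    rng = random.Random(seed)
    used = set()  # shared across sentences: a crude form of context
    sentences = [make_sentence(rng, used) for _ in range(n_sentences)]
    return ". ".join(sentences) + "."


if __name__ == "__main__":
    print(make_paragraph(3, seed=1))
```

Fixing the number of sentences is only one of the constraints the abstract mentions. A constraint on phonemes, words, or letters could be added in the same spirit, for example by rejecting candidate sentences until the running count matches a target.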
Journal: CoRR
Volume: abs/1712.09359
Publication date: 2017